ENG-2787: Add PBAC evaluation service to fides OSS #7741

Merged
galvana merged 7 commits into main from feat/pbac-oss-evaluation on Mar 26, 2026

Conversation

@galvana
Contributor

@galvana galvana commented Mar 24, 2026

Ticket ENG-2787

Description Of Changes

Adds the open-source PBAC (Purpose-Based Access Control) evaluation service to fides. This defines the service boundary that fidesplus delegates to for all PBAC evaluation — purpose-based checks, identity resolution, and the Policy v2 evaluator interface.

The entire evaluation stack is OSS:

  • PBACEvaluationService Protocol as the service boundary (input: RawQueryLogEntry, output: EvaluationResult)
  • InProcessPBACEvaluationService with identity resolution, purpose overlap engine, and policy filtering
  • Generic SQL parser (sqlglot) to extract table references from SQL queries
  • AccessPolicyEvaluator Protocol with NoOpPolicyEvaluator default and implementation guide for Policy v2
  • Redis-backed consumer and purpose repositories
  • BasicIdentityResolver for OSS (email, external_id, members list matching)

Fidesplus adds platform connectors (BigQuery, Snowflake) and the access control dashboard on top.
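The service boundary described above can be sketched as a structural Protocol. The names PBACEvaluationService, evaluate, RawQueryLogEntry, and EvaluationResult come from this PR; the field-level details below are illustrative placeholders, not the real definitions in types.py:

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class RawQueryLogEntry:
    """Illustrative stand-in for the real input type in types.py."""
    query_text: str
    consumer_hint: str = ""


@dataclass(frozen=True)
class EvaluationResult:
    """Illustrative stand-in: a compliant flag plus any violations."""
    compliant: bool
    violations: tuple = ()


@runtime_checkable
class PBACEvaluationService(Protocol):
    """Service boundary that fidesplus delegates to for PBAC evaluation."""

    def evaluate(self, entry: RawQueryLogEntry) -> EvaluationResult: ...


class AlwaysCompliantService:
    """Trivial implementation showing the Protocol is purely structural."""

    def evaluate(self, entry: RawQueryLogEntry) -> EvaluationResult:
        return EvaluationResult(compliant=True)
```

Any class with a matching evaluate signature satisfies the Protocol, which is what lets fidesplus swap in its own implementation behind the same boundary.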

Dependencies:

  • fidesplus#3247 — PBAC enforcement PR (consumes this service boundary)

Code Changes

  • src/fides/service/pbac/types.py — All shared types: RawQueryLogEntry, TableRef, ConsumerPurposes, DatasetPurposes, QueryAccess, Violation, ValidationResult, ResolvedConsumer, EvaluationViolation, EvaluationResult
  • src/fides/service/pbac/evaluate.py — Zero-dependency PBAC engine (purpose overlap check)
  • src/fides/service/pbac/reason.py — Human-readable violation reason generation
  • src/fides/service/pbac/service.py — PBACEvaluationService Protocol + InProcessPBACEvaluationService
  • src/fides/service/pbac/sql_parser.py — Generic SQL -> RawQueryLogEntry via sqlglot
  • src/fides/service/pbac/identity/ — IdentityResolver Protocol, BasicIdentityResolver, RedisIdentityResolver
  • src/fides/service/pbac/policies/ — AccessPolicyEvaluator Protocol, NoOpPolicyEvaluator, implementation guide
  • src/fides/service/pbac/consumers/ — DataConsumerEntity + DataConsumerRedisRepository
  • src/fides/service/pbac/purposes/ — DataPurposeEntity + DataPurposeRedisRepository
  • src/fides/service/pbac/redis_repository.py — Generic RedisRepository[T] base class
  • src/fides/service/pbac/exceptions.py — Domain exceptions
  • src/fides/service/pbac/engine/ — Backward-compat re-export shim
  • src/fides/service/pbac/evaluation/ — Backward-compat re-export shim
  • tests/service/pbac/policies/ — Unit tests for interface types, NoOp evaluator, and violation filtering

Steps to Confirm

  1. Verify types are importable without Redis/Celery deps: from fides.service.pbac.types import EvaluationResult
  2. Verify backward-compat shims: from fides.service.pbac.engine import evaluate_access
  3. Run unit tests: pytest tests/service/pbac/
  4. Start fidesplus dev with fides-pkg flag and run python demo/pbac_demo.py — all 7 steps should pass

Pre-Merge Checklist

  • Issue requirements met
  • All CI pipelines succeeded
  • CHANGELOG.md updated
    • Updates unreleased work already in Changelog, no new entry necessary
  • UX feedback:
    • No UX review needed
  • Followup issues:
    • Followup issues created
  • Database migrations:
    • No migrations
  • Documentation:
    • Documentation issue created in fidesdocs
    • If there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry

🤖 Generated with Claude Code

@galvana galvana requested a review from a team as a code owner March 24, 2026 06:45
@galvana galvana requested review from erosselli and removed request for a team March 24, 2026 06:45
@vercel
Contributor

vercel bot commented Mar 24, 2026

The latest updates on your projects:

  • fides-plus-nightly — Ready (Preview) — updated Mar 25, 2026 11:56pm (UTC)
  • fides-privacy-center — Ignored (1 skipped deployment) — updated Mar 25, 2026 11:56pm (UTC)


@galvana galvana requested review from thabofletcher and removed request for erosselli March 24, 2026 06:46
@greptile-apps
Contributor

greptile-apps bot commented Mar 24, 2026

Greptile Summary

This PR introduces the open-source PBAC (Purpose-Based Access Control) evaluation service to fides, providing a PBACEvaluationService Protocol, an InProcessPBACEvaluationService implementation backed by Redis, a generic SQL parser using sqlglot, and an AccessPolicyEvaluator extension point for Policy v2 integration.

The architecture is well-structured — pure types in types.py, a zero-dependency evaluation engine in evaluate.py, Redis-backed repositories with pipeline-based saves and secondary index management, and clean Protocol boundaries. However, there is one significant functional gap and several style issues:

  • Incomplete dataset purpose lookup (_build_dataset_purposes): The method always returns empty purpose sets for every dataset key, causing the evaluation engine's Rule 3 ("no declared purposes = no restrictions") to always apply. Registered consumers with any declared purpose will always be marked compliant, regardless of the dataset's actual allowed purposes — PBAC is only enforced against consumers with zero purposes. This makes the service functionally incomplete until the Fides dataset catalog lookup is wired in, and the gap is not currently documented anywhere in the code.
  • Imports inside functions: import re in sql_parser.py and from fides.api.util.cache import get_cache in identity/resolver.py should be moved to module-level per project convention.
  • Short variable names: dk and v used as loop variables in service.py violate the project's full-name convention.
  • Hardcoded magic string: "purpose_restriction" in service.py should be defined as a named constant.
  • PR size: This PR touches 30 files and adds 2,791 lines, exceeding the 500-line / 15-file threshold defined in the project rules.

Confidence Score: 3/5

  • The service scaffolding is solid, but the dataset purpose lookup is a stub that silently disables the core enforcement path for registered consumers, which should be addressed or prominently documented before merging.
  • The structural design (Protocols, Redis repositories, pure evaluation engine) is sound and well-tested at the policy layer. However, _build_dataset_purposes always returns empty purposes, meaning the PBAC check is effectively skipped for any consumer that has at least one declared purpose — only unregistered consumers (no purposes) will receive violations. This is the primary reason the service exists, so shipping without either a working catalog lookup or prominent documentation of the limitation introduces a real risk of false confidence in enforcement. The style issues (imports inside functions, short variable names, magic strings) are secondary but reinforce that the code needs a cleanup pass before merge.
  • Pay close attention to src/fides/service/pbac/service.py, specifically the _build_dataset_purposes method.

Important Files Changed

Filename Overview
src/fides/service/pbac/service.py Core service implementation; contains _build_dataset_purposes stub that always returns empty purpose sets, silently disabling purpose-mismatch enforcement for registered consumers. Also has short variable names (dk, v) and a hardcoded magic string.
src/fides/service/pbac/evaluate.py Pure evaluation engine with zero external dependencies; logic is clean and well-commented. Rule 1 always produces a violation for consumers with no purposes even when the dataset has no declared purposes — an intentional but noteworthy asymmetry.
src/fides/service/pbac/sql_parser.py SQL parser using sqlglot; import re is placed inside detect_statement_type instead of at the top of the module.
src/fides/service/pbac/identity/resolver.py Identity and dataset resolvers; build_identity_resolver factory contains a local import that should be moved to the module top-level.
src/fides/service/pbac/redis_repository.py Generic Redis-backed repository base class; well-structured with pipeline-based save, index management, and pagination. No issues found.
src/fides/service/pbac/consumers/repository.py DataConsumer Redis repository with full CRUD and purpose assignment; implementation looks correct and properly delegates to base class.
src/fides/service/pbac/types.py All shared PBAC types; well-defined dataclasses with zero external dependencies. Clean contract boundary between components.
src/fides/service/pbac/policies/interface.py Policy evaluator Protocol and supporting types; well-designed extension point for Policy v2.
src/fides/service/pbac/policies/noop.py No-op policy evaluator default; simple and correct placeholder implementation.
tests/service/pbac/policies/test_policy_filtering.py Tests for policy violation filtering; good coverage of allow/deny/noop evaluator paths. Missing test cases for core evaluate_access engine and SQL parser.
tests/service/pbac/policies/test_interface.py Tests for policy interface types and protocol compliance; complete and clean.
tests/service/pbac/policies/test_noop.py Tests for the no-op evaluator; straightforward and appropriate.



def detect_statement_type(query_text: str) -> str:
"""Detect the SQL statement type from the query text."""
import re
Contributor

P2 Import inside function body

import re is placed inside detect_statement_type rather than at the top of the module. This violates the project's import placement rule.

Suggested change: remove import re from the function body and place it with the module-level imports:

from __future__ import annotations

import re
from datetime import datetime, timezone
from uuid import uuid4

import sqlglot
import sqlglot.expressions as exp

Actually, since from __future__ import annotations must be the first statement, the import re should be placed after the existing imports at the top of the module (after line 13).

Rule Used: Python imports should always be placed at the top ... (source)

Learnt From
ethyca/fides#6536

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Contributor Author

Done — moved import re to module-level imports.

purpose_repo: DataPurposeRedisRepository,
) -> RedisIdentityResolver:
"""Factory to create a RedisIdentityResolver with proper dependencies."""
from fides.api.util.cache import get_cache
Contributor

P2 Import inside function body

from fides.api.util.cache import get_cache is placed inside build_identity_resolver rather than at the top of the module. This import should be moved to the top-level imports section.

Suggested change: at module level, replace

from fides.api.util.cache import get_cache

with

from fides.api.util.cache import FidesopsRedis, get_cache

Then update the function signature to remove the local import.

Rule Used: Move imports to the top of the module rather than ... (source)

Learnt From
ethyca/fidesplus#2441


Contributor Author

Done — moved from fides.api.util.cache import get_cache to module-level imports.

dataset_purposes=tuple(sorted(v.dataset_purposes)),
data_use=data_use,
reason=v.reason,
control="purpose_restriction",
Contributor

P2 Hardcoded magic string for control type

The string "purpose_restriction" is a magic string used as a control type identifier without a named constant. Per the project convention, define this as a named constant with an explanatory comment.

Suggested change: define a module-level constant for the control identifier:

# Control identifier for purpose-based access restriction violations
CONTROL_PURPOSE_RESTRICTION = "purpose_restriction"

Then reference it via control=CONTROL_PURPOSE_RESTRICTION at the call site.

Rule Used: Define magic strings as named variables with expla... (source)

Learnt From
ethyca/fidesplus#2515


Contributor Author

Done — extracted "purpose_restriction" to a PBAC_CONTROL_TYPE module-level constant.

Comment on lines +127 to +131
dk: DatasetPurposes(
dataset_key=dk,
purpose_keys=frozenset(dataset_purpose_overrides.get(dk, [])),
)
for dk in dataset_keys
Contributor

P2 Single-character and abbreviated variable names

Variables v (line 265) and dk (lines 127, 130, 131, 190, 191) are 1–2 character names that should use full descriptive names per the project's naming convention. Consider renaming dk to dataset_key and v to violation throughout these methods.

Rule Used: Use full names for variables, not 1 to 2 character... (source)


Contributor Author

Done — renamed dk to dataset_key and v/pk to violation/purpose_key throughout.

Comment on lines +184 to +195
def _build_dataset_purposes(
self,
dataset_keys: list[str],
) -> dict[str, DatasetPurposes]:
result: dict[str, DatasetPurposes] = {}
for dk in dataset_keys:
result[dk] = DatasetPurposes(
dataset_key=dk,
purpose_keys=frozenset(),
collection_purposes={},
)
return result
Contributor

P2 Incomplete dataset purpose lookup causes silent non-enforcement

This method populates every dataset with an empty purpose_keys frozenset. Per Rule 3 in the evaluation engine ("if a dataset has no declared purposes, access is compliant"), this means all consumers with at least one declared purpose will always receive a compliant result, regardless of the actual dataset's allowed purposes.

Concretely: a consumer with purpose "marketing" accessing a dataset annotated for "billing" will not produce a violation, because the lookup returns empty rather than frozenset({"billing"}).

Violations are only generated today for consumers with zero declared purposes (Rule 1). Registered consumers with purposes bypass all checks unless dataset_purpose_overrides is explicitly provided on every call.

The intended implementation should fetch declared purposes from the Fides dataset catalog (fideslang annotations on datasets, collections, and fields). Please add a clear docstring or inline comment documenting this limitation so callers are not misled into thinking enforcement is active.

Contributor Author

Done — _build_dataset_purposes now uses a pre-built dataset_purposes map passed at init time, so purpose-overlap enforcement works in the OSS path. Added both unit and integration tests.

@claude claude bot left a comment

Code Review: PBAC Evaluation Service

Good foundational work — the layered design (pure engine in evaluate.py, service boundary as a Protocol, no-op policy evaluator as a default) is clean and the module structure is well-thought-out. A few issues need attention before this lands.


Critical (Must Fix)

1. sqlglot not declared as a dependency (sql_parser.py:12-13)
sqlglot is imported at module level but is absent from pyproject.toml. This will cause an ImportError in any standard install. Needs to be added to the dependency manifest.

2. _build_dataset_purposes always returns empty purpose sets (service.py:184-195)
This is the most significant functional gap. Because every dataset resolves to purpose_keys=frozenset(), Rule 3 in the engine ("dataset with no purposes → compliant") fires for every dataset access by any consumer that has purposes. In practice, PBAC violations are only generated for Rule 1 (consumer with zero purposes). The core purpose-overlap check — the whole point of this service — is a no-op in the default path.

If fidesplus is expected to always pass dataset_purpose_overrides, that contract needs to be documented clearly and the parameter should arguably be required (or the method overridable). As-is, a caller that omits the override silently gets useless results.


Suggestions

3. Silent failure on SQL parse errors (sql_parser.py:48-49)
except Exception: return refs means parse failures are invisible. Downstream, the query looks compliant (no tables referenced). Add at minimum a logger.warning so operators know when queries couldn't be parsed.

4. import re inside function body (sql_parser.py:70)
re is stdlib and should be imported at module level with the other imports.

5. Read-then-write race condition in save() (redis_repository.py:74-85)
The existing entity is fetched outside the pipeline; index cleanup and new writes go inside it. A concurrent writer between the get and pipe.execute() can leave secondary indexes stale. This may be acceptable given expected write concurrency, but should be documented if so.

6. list_page loads all PKs into memory (redis_repository.py:173-174)
smembers materializes the full PK set before slicing. For large collections this will be slow. Worth noting as a known limitation.

7. _validate_purpose_keys raises ValueError (consumers/repository.py:207-210)
All other error paths raise from fides.service.pbac.exceptions. A ValueError breaks that convention — route handlers and callers that catch domain exceptions will miss this. Use ResourceNotFoundError("DataPurpose", fk) instead.

8. Members-based resolution undocumented gap (identity/resolver.py:21-67)
The README and BasicIdentityResolver both describe a three-step resolution chain including members-list lookup. RedisIdentityResolver only implements two steps, and the consumer repository doesn't maintain a members index. The README's "Resolution chain" section should reflect what's actually implemented in the OSS path.
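For reference, the documented three-step chain could be sketched as follows. The entity shape and function name are illustrative, and per this review only the first two steps are implemented in RedisIdentityResolver:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Consumer:
    """Illustrative consumer record; field names follow this review's terms."""
    fides_key: str
    contact_email: str = ""
    external_id: str = ""
    members: tuple = ()


def resolve_consumer(consumers: List[Consumer], identity: str) -> Optional[Consumer]:
    """Three-step resolution chain as documented for BasicIdentityResolver."""
    # Step 1: direct contact_email match
    for consumer in consumers:
        if consumer.contact_email == identity:
            return consumer
    # Step 2: external_id match
    for consumer in consumers:
        if consumer.external_id == identity:
            return consumer
    # Step 3: members-list match (BasicIdentityResolver only; not yet
    # implemented in the Redis-backed OSS path per this review)
    for consumer in consumers:
        if identity in consumer.members:
            return consumer
    return None
```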


Nice to Have

9. Rule 1 / Rule 3 asymmetry (evaluate.py:91-104)
A consumer with no purposes accessing a dataset with no purposes = violation. A consumer with purposes accessing a dataset with no purposes = compliant. The asymmetry is likely intentional but subtle enough to deserve a comment.

10. Test filtering logic duplicated from production (tests/service/pbac/policies/test_policy_filtering.py:115-136)
_filter_violations replicates the service's private method body. If the service implementation changes, these tests won't catch the regression. Consider testing through the actual service class.

11. Missing test coverage for core modules
evaluate.py, sql_parser.py, redis_repository.py, BasicIdentityResolver, and the end-to-end InProcessPBACEvaluationService.evaluate() path have no tests. The policy interface tests are thorough, but the engine logic and identity resolution are untested.

from uuid import uuid4

import sqlglot
import sqlglot.expressions as exp

Critical: sqlglot is not declared as a dependency.

sqlglot is imported at the top of this file but does not appear in pyproject.toml. Any environment that installs fides without it will get an ImportError at module load time — not a lazy failure.

Add sqlglot to the appropriate dependency group in pyproject.toml before merging.

Contributor Author

Done — added sqlglot~=30.0.3 to pyproject.toml dependencies and removed the type: ignore[import-not-found] comments from the imports.

try:
parsed = sqlglot.parse(query_text)
except Exception:
return refs

Suggestion: silent failure on parse errors loses observability.

A bare except Exception: return refs means malformed SQL silently produces an empty table-reference list. Downstream, this will look like a compliant query (no datasets accessed) rather than an evaluation error, making bugs very hard to diagnose.

At minimum, add a logger.debug or logger.warning here so operators can see when queries fail to parse.
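A hedged sketch of the suggested shape, where the injected parse callable stands in for the real sqlglot.parse call:

```python
import logging

logger = logging.getLogger(__name__)


def safe_extract_tables(query_text: str, parse) -> list:
    """Parse a query, logging (rather than swallowing) parse failures.

    `parse` stands in for sqlglot.parse; illustrative sketch only.
    """
    try:
        parsed = parse(query_text)
    except Exception:
        # Surfacing the failure keeps "unparseable query" distinguishable
        # from "query references no tables" in operator logs.
        logger.warning(
            "Failed to parse SQL query; returning no table refs",
            exc_info=True,
        )
        return []
    return parsed
```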

Contributor Author

Done — added logger.warning with exc_info=True so parse failures are visible in logs instead of silently returning an empty list.


def detect_statement_type(query_text: str) -> str:
"""Detect the SQL statement type from the query text."""
import re

Suggestion: import re at module level instead of inside the function body.

re is a stdlib module and should be imported at the top of the file alongside other imports. Importing inside a function is unconventional and mildly confusing — it implies a lazy-import rationale that doesn't apply here.

Contributor Author

Done — moved import re to module-level imports.

purpose_keys=frozenset(),
collection_purposes={},
)
return result

Critical: _build_dataset_purposes always returns empty purpose sets — PBAC violations will never be produced for consumers that have purposes.

By Rule 3 in evaluate.py: "If a dataset has NO declared purposes, access is considered compliant." This method always returns purpose_keys=frozenset() for every dataset, so unless dataset_purpose_overrides is passed by the caller, every consumer-with-purposes will always get a clean result regardless of what the dataset is.

The only case that still produces violations is Rule 1 (consumer has no purposes at all). The core PBAC purpose-overlap check (Rule 2) is effectively dead code in the default evaluation path.

If fidesplus is expected to supply purpose data via dataset_purpose_overrides, that assumption should be documented prominently here — and ideally the method should be overridable or accept a dataset-purpose lookup callable so the OSS service itself is useful.

Contributor Author

Done — _build_dataset_purposes now uses a pre-built dataset_purposes map passed at init time. The caller is responsible for building the map from the dataset catalog (including collection-level and field/sub-field-level purposes). Keys not in the map fall back to empty purposes. Added unit tests for evaluate_access() covering all three rules plus collection inheritance, and integration tests for InProcessPBACEvaluationService.evaluate() covering registered/unregistered consumers, the init-time map, overrides precedence, and backward compat.

pipe.sadd(self._all_key(), pk)
for field, value in self._get_index_entries(entity):
pipe.sadd(self._index_key(field, value), pk)
pipe.execute()

Suggestion: read-then-write race condition in index maintenance.

The pattern here reads the existing entity with self._cache.get(key) (line 74) outside the pipeline, then queues the stale-index cleanup and new-index writes inside the pipeline. Between the get and pipe.execute(), another writer can mutate or delete the entity, leaving indexes in an inconsistent state.

Redis doesn't support optimistic locking via WATCH/MULTI/EXEC natively through a pipeline, but using WATCH + a MULTI/EXEC block would make this safe. Alternatively, document the known race as acceptable given the expected write throughput for this use case.

Contributor Author

Acknowledged — this is a known limitation at current write concurrency levels. The race window is narrow (single Redis round-trip) and the consequence is stale secondary indexes, not data loss. Will document and revisit if concurrent writes become a real pattern.

"""
try:
pks = self._resolve_pks(filters)
sorted_pks = sorted(pks) if pks else []

Suggestion: list_page loads all PKs into memory to paginate.

_resolve_pks calls smembers on the :_all key (or performs a sinter), loading every PK in the set into a Python list before slicing. For large consumer/purpose sets this will be slow and memory-intensive. Redis SSCAN with cursor-based iteration would avoid materializing the full set, though it makes consistent ordering harder. Worth noting as a known limitation at this stage.

Contributor Author

Acknowledged as a known limitation. At current expected collection sizes this is acceptable. Will revisit with cursor-based pagination if collections grow large enough to cause memory pressure.

"""Check that all purpose fides_keys exist in the purpose repo."""
for fk in fides_keys:
if not self._purpose_repo.exists(fk):
raise ValueError(f"DataPurpose '{fk}' not found")

Suggestion: _validate_purpose_keys raises ValueError instead of a domain exception.

Every other error path in this repository raises ResourceNotFoundError or ResourceAlreadyExistsError from fides.service.pbac.exceptions. A ValueError here breaks that pattern — callers (and route handlers) that catch domain exceptions will miss this. Consider raising ResourceNotFoundError("DataPurpose", fk) for consistency.
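A minimal sketch of the suggested convention; the exception names match the review's, but the base class and constructor shape are assumed from the ResourceNotFoundError("DataPurpose", fk) call named in this comment:

```python
class PBACError(Exception):
    """Illustrative base class for PBAC domain exceptions."""


class ResourceNotFoundError(PBACError):
    """Raised when a referenced PBAC entity does not exist."""

    def __init__(self, resource_type: str, fides_key: str) -> None:
        super().__init__(f"{resource_type} '{fides_key}' not found")
        self.resource_type = resource_type
        self.fides_key = fides_key


def validate_purpose_keys(fides_keys, existing_purposes) -> None:
    """Raise a domain exception (not ValueError) for unknown purpose keys."""
    for fides_key in fides_keys:
        if fides_key not in existing_purposes:
            raise ResourceNotFoundError("DataPurpose", fides_key)
```

Callers that already catch PBACError (or ResourceNotFoundError specifically) then handle this path the same way as every other repository error.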

Contributor Author

Done — replaced ValueError with ResourceNotFoundError("DataPurpose", fk) for consistency with other error paths.

"""Resolve platform table references to Fides dataset fides_keys.

Uses a configurable mapping strategy:
- Direct match: ``{dataset}.{table}`` → Fides dataset ``fides_key``

Suggestion: members-based resolution is documented but not wired up.

The README and BasicIdentityResolver both document a three-step resolution chain: contact_emailexternal_idmembers list. RedisIdentityResolver only implements the first two. The members index is not maintained in DataConsumerRedisRepository._get_index_entries, so there is no way to find a consumer by member email via Redis.

Either remove the members resolution step from the README's "Resolution chain" section for the OSS path, or add a members index to the consumer repository and implement the lookup here.

Contributor Author

Done — updated README to note that members-based resolution is not yet implemented in the OSS path. The resolution chain section now reflects what RedisIdentityResolver actually supports (contact_email and external_id only).

consumer_purposes=consumer.purpose_keys,
dataset_purposes=effective,
reason="Consumer has no declared purposes",
)

Nice to have: Rule 1 asymmetry with Rule 3 — consider documenting intent.

Rule 1 fires a violation when a consumer has no purposes, even when the dataset also has no declared purposes (effective == frozenset()). Rule 3, by contrast, says that a consumer with purposes accessing a dataset with no purposes is compliant.

This asymmetry means "consumer with no purposes + dataset with no purposes" = violation, while "consumer with purposes + dataset with no purposes" = compliant. That may be intentional (a consumer must declare something to access any dataset), but it's subtle enough to warrant a comment in the code or the docstring explaining why.

Contributor Author

Done — added a comment explaining the intentional asymmetry: an unregistered consumer (no purposes) should never silently pass evaluation, even against an unrestricted dataset.

result = _make_validation_result(violations=violations)
evaluator = AllowSpecificDatasetEvaluator(allowed_datasets={"ds_allowed"})
filtered = _filter_violations(result, evaluator)
assert len(filtered.violations) == 1

Suggestion: the filtering logic under test is duplicated from production code.

_filter_violations replicates the body of InProcessPBACEvaluationService._filter_violations_through_policies verbatim. If the production implementation changes, these tests will silently continue to pass against the copy rather than the real code.

Consider testing the method on the actual service instance (with a test double for the cache), or at minimum extract the filtering function to a standalone helper that both the service and the tests import.

Contributor Author

Acknowledged — deferring this refactor to a follow-up. The duplicated logic is a test isolation choice; will revisit when the service interface stabilizes.

Contributor
@kruulik kruulik left a comment

I don't have anything to add beyond what Claude has already commented - approving in advance to unblock.

Adrian Galvan and others added 3 commits March 25, 2026 15:26
Purpose-Based Access Control (PBAC) evaluation engine and service
boundary for open-source use. Evaluates whether data consumers have
the declared purposes required to access datasets.

Components:
- types.py: Standardized input/output types (RawQueryLogEntry, EvaluationResult)
- evaluate.py: Zero-dependency PBAC engine (purpose overlap check)
- service.py: PBACEvaluationService Protocol + InProcessPBACEvaluationService
- sql_parser.py: Generic SQL -> RawQueryLogEntry via sqlglot
- identity/: IdentityResolver Protocol + BasicIdentityResolver + RedisIdentityResolver
- policies/: AccessPolicyEvaluator Protocol + NoOpPolicyEvaluator + implementation guide
- consumers/, purposes/: Redis-backed entity storage
- README.md: OSS user flow documentation

The evaluation service accepts a RawQueryLogEntry and returns a flat
EvaluationResult. Fidesplus adds platform connectors (BigQuery, Snowflake)
and the access control dashboard on top.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add type: ignore comments for imports that reference modules not yet
available in fides OSS (data_purpose, data_consumer, sqlglot). Add
type annotation for the collections variable in evaluate.py. Add
changelog entry for this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move type: ignore[import-not-found] to the from-line so mypy respects
the suppression. Use empty tuple default for collections to match the
dict[str, tuple[str, ...]] return type. Update changelog 7700 PR
number to 7741 to match this PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@galvana galvana force-pushed the feat/pbac-oss-evaluation branch from c4da20e to f6ac648 Compare March 25, 2026 22:27
- Add sqlglot~=30.0.3 to pyproject.toml dependencies
- Add logger.warning on SQL parse failures for observability
- Move import re and get_cache to module-level imports
- Wire dataset purposes through init-time map instead of empty stub
- Rename short variables (dk, v, pk) to descriptive names
- Extract PBAC_CONTROL_TYPE constant for magic string
- Replace ValueError with ResourceNotFoundError in purpose validation
- Update README to reflect actual OSS resolution chain (no members yet)
- Add comment explaining Rule 1/3 asymmetry in evaluate engine
- Add unit tests for evaluate_access and integration tests for the service

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions bot commented Mar 25, 2026

Dependency Review

The following issues were found:
  • ✅ 0 vulnerable package(s)
See the Details below.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 98cd1ec.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

  • uv.lock

@galvana galvana removed the request for review from speaker-ender March 25, 2026 23:29
@galvana galvana enabled auto-merge March 25, 2026 23:30
@galvana galvana added this pull request to the merge queue Mar 25, 2026
@galvana galvana removed this pull request from the merge queue due to a manual request Mar 25, 2026
@galvana galvana enabled auto-merge March 25, 2026 23:52
@galvana galvana added this pull request to the merge queue Mar 26, 2026
Merged via the queue into main with commit 6138f2c Mar 26, 2026
62 of 63 checks passed
@galvana galvana deleted the feat/pbac-oss-evaluation branch March 26, 2026 00:17